class: center, middle, inverse, title-slide .title[ # Be Bayesian my friend. Bayesian inference - PART II ] .subtitle[ ## VI Jornadas Científicas de Estudiantes de la SEB ] .author[ ###
Joaquín Martínez-Minaya, 2022-09-15
VAlencia BAyesian Research Group
Statistical Modeling Ecology Group
Grupo de Ingeniería Estadística Multivariante
] .date[ ###
jmarmin@eio.upv.es
]

---
# Table of contents

## 1. Hierarchical Bayesian models
## 2. Bayesian computation. MCMC methods
## 3. Bayesian computation. OpenBUGS

---
class: inverse, center, middle, animated, slideInRight
# 1. Hierarchical Bayesian models

---
# Example: Scoring penalties for Valencia C. F.

.left-column9[
- Liga Santander is one of the most famous leagues in the world. In this example, we use data from the last 10 seasons to estimate the probability of success `\((\pi)\)` of scoring a penalty for .hlb[Valencia Club de Fútbol].
]

.right-column9[
 
]

---
# Again we talk about football

## - We consider the same experiment in .hlb[10 different teams]

--

## - How can we model this situation, and what can we conclude?

--

## - More generally, how can we incorporate .hlb[random effects]?

---
# Three ways to do so

### 1. .hlbred[All teams have the same characteristics].

.left-column9[
- Apply a .hlbred[joint analysis] to all the teams.
- The probability of scoring a penalty `\((\pi)\)` is the .hlbred[same in all teams].
- Observations are independent and identically distributed.
`$$y_i \mid \pi \sim \text{Ber}(\pi)$$`
`$$\pi \sim \text{Beta}(a,b)\,, \text{ with } a \text{ and } b \text{ fixed}$$`
]

.right-column9[
.center[
 
]
]

---
# Three ways to do so

### 2. .hlbred[Each team is different and has nothing in common with the others].

.left-column9[
- Apply an analysis to .hlbred[each team separately].
- Assume a .hlbred[different scoring probability] in each one: `\(\pi_j\)`, `\(j=1, \ldots, J\)`. In this case, `\(J = 10\)`.
- Observations are independent but are .hlb[distributed differently in each team].
- The .hlb[likelihood] is different for each team. For each `\(j\)`,
`$$y_{ij} \mid \pi_j \sim \text{Ber}(\pi_j)$$`
`$$\pi_j \sim \text{Beta}(a_j, b_j)\,, \text{ with } a_j \text{ and } b_j \text{ fixed}$$`
]

.right-column9[
.center[
 
]
]

---
# In view of the two possible models

### - Is it reasonable to assume .hlbred[the same scoring probability] in all teams?
--

### - There are reasons to suggest that .hlbred[there is variability in those probabilities]:
  - The teams do not behave in the same way.
  - Observations from the same team are more similar to each other than observations from different teams.

--

### - Is it reasonable to think that .hlbred[there is no relationship between the scoring probabilities] of the different teams?

## Although not identical, .hlbred[teams] are at least .hlb[similar].

---
# Three ways to do so

### 3. Consider a .hlbred[hierarchical model].

.left-column7[
- The parametric vector `\(\boldsymbol{\pi} = (\pi_1, \ldots, \pi_{J})\)` is a .hlbred[random sample from a common distribution] that depends on a vector of .hlbred[hyperparameters], `\(a\)` and `\(b\)`, partially or totally unknown.
- The model
`$$y_{ij} \mid \pi_j \sim \text{Ber}(\pi_j)\,, \ j = 1, \ldots, J$$`
$$\pi_j \sim \text{Beta}(a, b) $$
$$a \sim p(a), \ b \sim p(b) $$
]

.right-column9[
 
]

---
# Numerical approaches

- When applying Bayesian statistics, most of the usual models do not yield .hlbred[analytical expressions for the posterior] or the posterior predictive distributions.

--

- Most of the .hlb[complications that appear in the Bayesian methodology] come from solving the integrals that appear in the learning process:
  - the normalization constant of the posterior distribution,
  - moments and quantiles of the posterior,
  - credible regions, probabilities in hypothesis tests, etc.

--

### Solutions:
- .hlb[Monte Carlo methods: MCMC].
- .hlb[INLA].

---
class: center, middle, animated, rotateInUpRight, inverse
# 2. Bayesian computation. MCMC methods

---
# Monte Carlo Methods

.left-column9[
.hlbpurple[Monte Carlo Simulation]
- To simulate means to .hlb[imitate a real process].
- Obtain .hlb[realizations of a random variable] for which only its density function is (fully or partially) known.
```r
x <- rnorm(1000, mean = 1, sd = 2)
```

<!-- -->
]

--

.right-column9[
## .hlbpurple[Monte Carlo Integration]
- Computing the mean of a N(1, 2) (mean 1, standard deviation 2),
- `\(E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\, dx\)`
- Using .hlb[Monte Carlo integration]:
  - Simulate from N(1, 2): `\(\phi^1, \ldots, \phi^N\)`.
  - Compute the mean of the simulated values: `\(E(X) \approx \frac{1}{N} \sum_{i=1}^N \phi^i\)`
- Applying `summary` to the simulation, we obtain further measures:

| Min.| 1st Qu.| Median| Mean| 3rd Qu.| Max.|
|------:|-------:|------:|-----:|-------:|-----:|
| -6.414| -0.341| 1| 1.018| 2.38| 8.655|
]

---
# Markov Chain Monte Carlo

- A Markov chain is a .hlbred[stochastic sequence of numbers] where each value in the sequence depends only upon the last.

--

- If `\(\phi^1, \phi^2, \ldots, \phi^N\)` is a sequence of numbers, then `\(\phi^2\)` is only a function of `\(\phi^1\)`, `\(\phi^3\)` of `\(\phi^2\)`, etc.

--

- Under certain conditions, the distribution over the states of the .hlb[Markov chain] will .hlb[converge to a stationary distribution].

--

- The .hlb[stationary distribution is independent of the initial starting values] specified for the chains.

--

- AIM: construct a Markov chain such that .hlb[the stationary distribution is equal to the posterior distribution] `\(p(\theta \mid x)\)`.

--

- We combine Markov chains with Monte Carlo simulation --> .hlbpurple[Markov chain Monte Carlo (MCMC)].

--

- They were first introduced into Statistics by <a href = "https://www.tandfonline.com/doi/abs/10.1080/01621459.1990.10476213"> Gelfand and Smith (1990) </a>.
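---
# Monte Carlo integration in R

The Monte Carlo integration from the previous slides can be reproduced in a few lines of R (a minimal sketch; note that in `rnorm` the second parameter, `sd = 2`, is the standard deviation, so this is the N(1, 2) of the slides):

```r
set.seed(2022)                      # for reproducibility
# Simulate phi^1, ..., phi^N from N(1, 2): mean 1, sd 2
phi <- rnorm(100000, mean = 1, sd = 2)

# Monte Carlo estimate of E(X): the average of the simulated values
mean(phi)      # approximately 1, the true mean

# summary() gives further measures (quantiles, min, max)
summary(phi)
```

Increasing the number of simulations reduces the Monte Carlo error, which decreases at rate `\(1/\sqrt{N}\)`.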
---
# Posterior distribution
## Estimating the probability of scoring a penalty

### - .hlb[Likelihood] (`\(k\)` penalties scored out of `\(N\)`)
$$p(\boldsymbol{y} \mid \pi) = \ell(\pi) = \pi^{k}(1-\pi)^{N-k} $$

--

### - .hlb[Prior distribution]
$$p(\pi) \propto \pi^{a-1}(1-\pi)^{b-1} $$

--

### - .hlb[Posterior distribution]
`$$p(\pi \mid \boldsymbol{y}) \propto p(\boldsymbol{y} \mid \pi) \times p(\pi) \propto \pi^{k + a - 1}(1 - \pi)^{N - k + b - 1}$$`

---
# MCMC: Metropolis-Hastings (MH)

1. Starting value `\(\pi^{(0)}\)`
2. For `\(t = 1, \ldots, T\)`
  - .hlb[We define a proposal distribution] (usually similar to the target distribution). In this case, `\(q(\pi \mid \pi^{(t-1)})\)` is a `\(\text{logit-N}(\pi^{(t-1)}, \sigma = 0.5)\)` density.
  - .hlb[Simulate] `\(\pi^{(prop)}\)` .hlb[from the proposal distribution] `\(q(\pi \mid \pi^{(t-1)})\)`.
  - Compute the .hlbred[probability of acceptance]:
`$$\alpha = \text{min}\left(1, \frac{p^*(\pi^{(prop)})q(\pi^{(t-1)} \mid \pi^{(prop)})}{p^*(\pi^{(t-1)})q(\pi^{(prop)} \mid \pi^{(t-1)})}\right)$$`
  - Generate a .hlb[random number] `\(u\)` from the Uniform(0, 1).
  - `\(\pi^{(t)} = \pi^{(prop)}\)`, if `\(u \leq \alpha\)`,
  - `\(\pi^{(t)} = \pi^{(t-1)}\)`, if `\(u > \alpha\)`
3. Finally, we .hlb[obtain] `\(\pi^{(0)}, \pi^{(1)}, \ldots, \pi^{(T)}\)`, which is .hlb[a sample from the posterior distribution].

---
# MCMC: Metropolis-Hastings (MH)
.center[
 
]

---
# Approaching the probability of scoring using MH

.left-column9[
.hlbpurple[Visual Metropolis-Hastings]
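A minimal R sketch of the MH algorithm from the previous slides; the data (`k = 70` penalties scored out of `N = 100`) and the flat Beta(1, 1) prior are illustrative assumptions, not the real Valencia C. F. figures:

```r
# Target: p(pi | y) proportional to pi^(k+a-1) * (1-pi)^(N-k+b-1)
k <- 70; N <- 100; a <- 1; b <- 1
log_post  <- function(p) (k + a - 1) * log(p) + (N - k + b - 1) * log(1 - p)
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(x) 1 / (1 + exp(-x))

set.seed(1)
n_iter <- 5000
chain <- numeric(n_iter)
chain[1] <- 0.5                                  # starting value pi^(0)
for (t in 2:n_iter) {
  cur  <- chain[t - 1]
  prop <- inv_logit(rnorm(1, logit(cur), 0.5))   # logit-normal proposal, sigma = 0.5
  # log acceptance ratio; log(p * (1 - p)) is the Jacobian of the logit transform
  log_alpha <- log_post(prop) + log(prop * (1 - prop)) -
               log_post(cur)  - log(cur  * (1 - cur))
  chain[t] <- if (log(runif(1)) < log_alpha) prop else cur   # accept if u <= alpha
}
mean(chain[-(1:1000)])   # close to the exact posterior mean, (k + a)/(N + a + b)
```

With a Beta prior the posterior is known in closed form, so this toy chain can be checked against the exact answer.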
- Play the video from .hlb[minute 4:44].
]

--

.right-column9[
.hlb[Tracing the chain. Is the chain autocorrelated?]
 
]

---
# MCMC. Burn-in and thinning

.left-column9[
 
]

.right-column9[
 
 
]

---
class: center, middle, animated, rotateInUpRight, inverse
# 3. Bayesian computation. OpenBUGS

---
# OpenBUGS

.left-column9[
- Open-source program for Bayesian modelling that was developed from WinBUGS.
- It allows us to obtain MCMC simulations in a fairly intuitive way.
- You can download it from https://www.mrc-bsu.cam.ac.uk/software/bugs/openbugs/
- Other software packages exist:
  - <a href= "https://cran.r-project.org/web/packages/MCMCpack/index.html"> MCMCpack </a> (an R package that lets you use MCMC methods with notation similar to standard R)
  - <a href="https://mc-stan.org/" style="color:#FF0000;"> Stan </a>
  - <a href="https://r-nimble.org/" style="color:#FF0000;"> NIMBLE </a>
  - <a href="https://mcmc-jags.sourceforge.io/" style="color:#FF0000;"> JAGS </a>
  - <a href="https://www.r-inla.org/" style="color:#FF0000;"> INLA </a> (it does not use MCMC methods, but it is very powerful for doing Bayesian inference)
]

.right-column9[
 
]

---
# Disease mapping

- <a href = "https://www.sciencedirect.com/science/article/pii/S0167947306004531?casa_token=F3Lcx9eFKqcAAAAA:gZA2NIvVrBqCdZpqhkYHeQeVnvQTubjhzljohKuuvwi_Mlq53-_Rav0qpLzzJGRqcPCrQuaifzI> Congdon (2007) </a> studies .hlb[suicide mortality in 32 London boroughs] (excluding the City of London) in the period 1989–1993 for males and females combined, using a .hlb[disease mapping model and an ecological regression model].

.left-column4[
- The variables are:
  - **N**: the name of the borough,
  - **O**: the number of observed suicides in the period under study,
  - **E**: the number of expected suicides,
  - **x1**: an index of social deprivation, and
  - **x2**: an index of social fragmentation, which represents the lack of social connections and of a sense of community.
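With these columns in a data frame, the raw SMR is just the ratio `O/E` (a minimal sketch using the first three boroughs of the table as illustrative values):

```r
# Raw standardized mortality ratio: observed / expected suicides
dat <- data.frame(O = c(75, 145, 99),
                  E = c(80.7, 169.8, 123.2))
round(dat$O / dat$E, 2)   # 0.93 0.85 0.80; SMR > 1 means more cases than expected
```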
]

.right-column4[

|NAME | O| E| x1| x2|
|:--------------------|---:|-----:|-----:|-----:|
|Barking and Dagenham | 75| 80.7| 0.87| -1.02|
|Barnet | 145| 169.8| -0.96| -0.33|
|Bexley | 99| 123.2| -0.84| -1.43|
|Brent | 168| 139.5| 0.13| -0.10|
|Bromley | 152| 169.1| -1.19| -0.98|
|Camden | 173| 107.2| 0.35| 1.77|
]

---
# Standardized Mortality Ratio (SMR): raw data
.center[
]

---
# The model

- A conditionally independent .hlb[Poisson] likelihood function is assumed:
`$$y_i \sim \text{Poisson}(\lambda_i), \ \ \lambda_i = E_i \rho_i, \ \ \log(\rho_i) = \eta_i \,\,, \ i = 1, \ldots, 32 \,\,$$`

--

- We assume that `\(\eta_i = \beta_0 + \beta_1 x_{1i} + sp_i\)`, where `\(\beta_0\)` and `\(\beta_1\)` are the parameters, and `\(\boldsymbol{sp}\)` is the .hlb[spatially structured random effect].

--

- Gaussian priors with mean 0 and `\(\sigma = 100\)` are assigned to the parameters.

--

- A conditionally specified Gaussian distribution is assigned to `\(\boldsymbol{sp}\)`,
`$$sp_i \mid \boldsymbol{sp}_{-i} \sim \mathcal{N} \left(\frac{1}{n_{i}} \sum_{j \sim i} sp_j, \frac{1}{n_{i} \tau_{\boldsymbol{sp}}}\right)\,\,.$$`

--

- Note that `\(\boldsymbol{sp} = (sp_1, \ldots, sp_{32})\)` is a random effect which depends on the hyperparameter `\(\tau_{sp}\)`.

--

- As `\(\sigma_{sp} = \frac{1}{\sqrt{\tau_{sp}}}\)`, we can assign a .hlb[hyperprior] to `\(\sigma_{sp}\)`: `\(\sigma_{sp} \sim \text{Uniform}(0, 10)\)`.

---
# The model in OpenBUGS

.left-column6[
.medium[

```r
model.file <- "model.txt"  # illustrative path for the BUGS model code
cat("model{
  for(i in 1:n){
    O[i] ~ dpois(mu[i])
    log(mu[i]) <- log(E[i]) + beta0 + beta1*x1[i] + sp[i]
    R[i] <- exp(beta0 + beta1*x1[i] + sp[i])
    prob[i] <- step(R[i]-1)
  }
  sp[1:n] ~ car.normal(adj[], w[], num[], precsp)
  beta0 ~ dnorm(0, 0.0001)
  beta1 ~ dnorm(0, 0.0001)
  precsp <- pow(sdsp,-2)
  sdsp ~ dunif(0,10)
}", file = model.file)
```

]]

.right-column6[
 
]

---
# Posterior distributions of the parameters

 

---
# Is the variable deprivation relevant?

## Summary statistics

```r
summary(mod1_mcmc)[[2]] %>% round(., 3) %>% .[1:2, ] %>% knitr::kable(.)
```

| | 2.5%| 25%| 50%| 75%| 97.5%|
|:-----|-----:|-----:|-----:|-----:|-----:|
|beta0 | 0.008| 0.030| 0.040| 0.051| 0.072|
|beta1 | 0.074| 0.126| 0.153| 0.179| 0.230|

## Computing `\(P(\beta_1 > 0)\)`

```r
(mod1_mcmc[[1]][, "beta1"] > 0) %>% table(.) %>% c(.)/length(mod1_mcmc[[1]][, "beta1"])
```

```
## TRUE 
##    1
```

---
# Posterior distributions of the hyperparameters

 

| 2.5%| 25%| 50%| 75%| 97.5%|
|-----:|-----:|-----:|-----:|-----:|
| 0.268| 0.332| 0.372| 0.417| 0.523|

---
# Posterior distribution of the spatial effect
.center[
]

---
# Posterior distribution for the SMR and P(SMR > 1)
.center[
]

---
# What we have learned so far

- ### The biggest challenge for Bayesian inference has always been the .hlb[computational power] that more complex models require.

--

- ### .hlb[MCMC methods allow us to computationally estimate posterior distributions] that are analytically intractable.

--

- ### Today, many research teams are developing computational techniques for approximating posterior distributions.

--

- ### For many areas of Statistics, such as .hlb[Spatial Statistics], Bayesian methodology has been a turning point.

---
# References

.hlb[Books]

- .hlb[OpenBUGS]: Cowles, M. K. (2013). Applied Bayesian Statistics: with R and OpenBUGS Examples (Vol. 98). Springer Science & Business Media.
- .hlb[WinBUGS]: Ntzoufras, I. (2011). Bayesian Modeling Using WinBUGS. John Wiley & Sons.
- .hlb[Stan]: Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis. Chapman and Hall/CRC.
- .hlb[INLA]: Gómez-Rubio, V. (2020). Bayesian Inference with INLA. CRC Press.
- .hlb[Disease mapping]: Martínez-Beneito, M. A., & Botella-Rocamora, P. (2019). Disease Mapping: From Foundations to Multidimensional Modeling. CRC Press.
---
# References

.hlb[Blogs]

- http://wlm.userweb.mwn.de/R/wlmRcoda.htm
- https://rpubs.com/FJRubio/IntroMCMC

---
class: inverse, left, middle, animated, rotateInUpLeft

</br>
# Be Bayesian my friend. Bayesian inference - PART II
## VI Jornadas Científicas de Estudiantes de la SEB
</br>
<font size="6"> Joaquín Martínez-Minaya, 2022-09-15 </font>
</br>
<a href="http://vabar.es/" style="color:white;" > VAlencia BAyesian Research Group </a>
</br>
<a href="https://smeg-bayes.org/ " style="color:white;"> Statistical Modeling Ecology Group </a>
</br>
<a href="https://giem.blogs.upv.es/" style="color:white;"> Grupo de Ingeniería Estadística Multivariante </a>
</br>
<a href="mailto:jmarmin@eio.upv.es" style="color:white;"> jmarmin@eio.upv.es </a>

---